A continuación se presenta el cuaderno con los graficos creados para el 30DayChartChallenge propuesto como actividad en la asignatura de Visualizacion de Datos del master de Big Data de la universidad internacional de Valencia
Attaching package: 'gridExtra'
The following object is masked from 'package:dplyr':
combine
#Importación de los datos#setwd("/Users/MuzDog/Desktop/MasterBigData/0_Estadistica_Adv/Actividad_2")data <-read_csv("/Users/MuzDog/Desktop/MasterBigData/Estadistica_Adv/Actividad_2/export (1).csv")
Rows: 3219 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (8): tavg, tmin, tmax, prcp, snow, wdir, wspd, pres
lgl (2): wpgt, tsun
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data$date <-as.Date(data$date, format="%Y-%m")#Preprocesamiento (Limpieza filas y columnas vacias):data_clean <- data %>%drop_na(tavg, tmin, tmax) # Limpieza filas vaciasdata_clean <- data_clean %>%select(-snow, -wdir, -wspd, -wpgt, -pres, -tsun) # Limpieza columnas vaciasdata_clean$AñoMes <-format(data_clean$date, "%Y-%m") # Crear una nueva columna que solo tenga el año y el mesdata_mensual <- data_clean %>%# Calcular la temperatura media para cada mesgroup_by(AñoMes) %>%summarise(Temp_Media_Mensual =mean(tavg, na.rm =TRUE), Temp_Minima_Mensual =mean(tmin, na.rm =TRUE), Temp_Maxima_Mensual =mean(tmax, na.rm =TRUE))data_mensual$AñoMes <-as.Date(paste0(data_mensual$AñoMes, "-01"))# Visualizacion inicialplot1 <-ggplot(data_clean, aes(x = date)) +geom_line(aes(y = tmax, color ="Tmax")) +geom_line(aes(y = tmin, color ="Tmin")) +geom_line(aes(y = tavg, color ="Tavg")) +labs(title ="Variables Meteorológicas en Madrid (2015-2024)",x ="Fecha", y ="Valor") +scale_color_manual(values =c("Tmax"="red", "Tmin"="green", "Tavg"="yellow")) +theme_minimal() +theme(plot.title =element_text(hjust =0.5))#Makeover# Crear eñ cpnjunto de datos a partir del anterior.(Paso innecesario pero me da pereza cambiar todo)x <- data_clean$datey1 <- data_clean$tminy2 <- data_clean$tavgy3 <- data_clean$tmaxdata <-data.frame(x, y1, y2, y3)# Encontrar el valor mínimo y sus coordenadasmin_tmin <-min(data$y1, na.rm =TRUE)min_tmin_x <- data$x[which.min(data$y1)]# Crear el gráficoplot2 <-ggplot() +geom_line(aes(x = x, y = y1, color ="T Minima (ºC)"), data = data, linewidth =0.7) +geom_line(aes(x = x, y = y2, color ="T Media (ºC)"), data = data, linewidth =0.7) +geom_line(aes(x = x, y = y3, color ="T Maxima (ºC)"), data = data, linewidth =0.7) +geom_line(aes(x = AñoMes, y = Temp_Minima_Mensual, color ="blue"), data = data_mensual, linewidth =0.1) +geom_line(aes(x = AñoMes, y = Temp_Media_Mensual, color ="T Media Mensual (ºC)"), data = data_mensual, linewidth =0.7) +geom_line(aes(x = AñoMes, y = Temp_Maxima_Mensual, color ="darkmagenta"), data = data_mensual, linewidth =0.1) +# Colorear el área entre y1 y y2geom_ribbon(aes(x = AñoMes, ymin =pmin(Temp_Minima_Mensual, Temp_Media_Mensual), ymax =pmax(Temp_Minima_Mensual, Temp_Media_Mensual), fill ="Área T Min - T Media"), alpha =0.5 , data = data_mensual) +# Colorear el área entre y2 y y3geom_ribbon(aes(x = AñoMes, ymin =pmin(Temp_Media_Mensual, Temp_Maxima_Mensual), ymax =pmax(Temp_Media_Mensual, Temp_Maxima_Mensual), fill ="Área T Media - T Max"), alpha =0.5, data = data_mensual) +labs(title ="Variación de la temperatura en Madrid (2015-2024)", subtitle =" Linea de temperaturas diarias para visualizar cambios puntuales \ncon areas representando temperaturas mensuales medias",x ="Tiempo en años", y ="Temperatura") +# Leyenda para las áreasscale_fill_manual(name ="Áreas", values =c("Área T Min - T Media"="blue", "Área T Media - T Max"="darkmagenta")) +# Leyenda para las líneasscale_color_manual(name ="Líneas", values =c("T Minima (ºC)"="green", "T Media (ºC)"="yellow", "T Maxima (ºC)"="red","T Media Mensual (ºC)"="black")) +scale_x_date(date_breaks ="1 year", date_labels ="%Y", minor_breaks ="3 months") +scale_y_continuous(breaks =seq((floor(min(data$y1) /10) *10-10), (ceiling(max(data$y3) /10) *10+10), by =10)) +theme_minimal() +theme(plot.title =element_text(hjust =0.5, face ="bold"), plot.subtitle =element_text(hjust =0.5), plot.caption =element_text(hjust =0, face ="italic") # Estilo del comentario ) +# Añadir el comentario debajo del gráficolabs(caption ="Autor: Mario Salvador López Muñoz | Datos: Meteostat") +guides(color =guide_legend(override.aes =list(size =5))) +geom_segment(aes(x = min_tmin_x, y = min_tmin +2, xend = min_tmin_x +100, yend = min_tmin +2.5), color ="black") +# Línea hacia arriba# Añadir etiqueta "Filomena" en un recuadrogeom_label(aes(x = min_tmin_x, y = min_tmin, label ="Filomena"), fill ="white", color ="black", size =4, label.size =0.5, # Grosor del bordehjust =-0.2, vjust =-1)grid.arrange(plot1, plot2, ncol =1)
Gráfico 4
Dia 4: Waffle
# Cargar las librerías necesariaslibrary(ggplot2)library(RColorBrewer)library(waffle)# Datos para la subdivisión de Chordatadata1 <-data.frame(Categoria =c('Peces de espinas', 'Anfibios', 'Aves', 'Mamíferos', 'Reptiles', 'Peces cartilaginosos'),Cantidad =c(159, 84, 83, 37, 32, 1))# Convertir las clases en una forma adecuadadata1$Celdas <-round(data1$Cantidad /sum(data1$Cantidad) *100)data1$Celdas[data1$Categoria =="Peces cartilaginosos"] <- data1$Celdas[data1$Categoria =="Peces cartilaginosos"] +1# Usar la paleta Set1 de RColorBrewer para los colorescolores_set1 <-c("#4DAF4A", "#FF7F00", "#E41A1C", "#984EA3", "#377EB8", "#A65628", "#984EA3", "#FF69B4")# Crear el waffle chart con colores de la paleta Set1waffle_subdivision <-ggplot(data1, aes(fill = Categoria, values = Celdas)) +geom_waffle(n_rows =10, color ="white", radius = grid::unit(0.2, "npc"), width =0.97, height =0.97) +# Definir filas y color de bordesscale_fill_manual(values = colores_set1) +# Usar los colores de Set1labs(title ="Distribución de animales extintos", x ="Cada cuadrado representa 1%", fill ="Clase") +# Cambiar el nombre de la leyendatheme_minimal() +theme(plot.title =element_text(hjust =0.5, vjust =-5, face ="bold", size =16), plot.subtitle =element_text(hjust =0.5), plot.caption =element_text(hjust =0.2, vjust =10, face ="italic"),legend.position ="right", # Posición de la leyendaaxis.title.y =element_blank(), # Eliminar el label del eje Yaxis.title.x =element_text(vjust =8), #Mueve la posicion del titulo del eje xaxis.text =element_blank(),axis.ticks =element_blank(),panel.grid =element_blank()) +labs(caption ="Autor: Mario Salvador López Muñoz | Datos: IUCN Red List of Threatened Species. Version 2024-1")print(waffle_subdivision)
Gráfico 5
Dia 5 : Diverging
library(ggplot2)oecd_data <-data.frame(paises =c('Alemania', 'Arabia Saudita', 'Argentina', 'Australia', 'Brasil', 'Canadá', 'China', 'Corea del Sur', 'Estados Unidos', 'Francia', 'India', 'Indonesia', 'Italia', 'Japón', 'México', 'Reino Unido', 'Rusia', 'Sudáfrica', 'Türkiye'),pib_percapita =c(49520, 30686, 12959, 64440, 9899, 49419, 11657, 30594, 76864, 41330, 2338, 4612, 36070, 31111, 13070, 46619, 12922, 5764, 12080),origen_datos ="https://datosmacro.expansion.com/paises/grupos/g20")# Calcular la media del PIB per cápitamedia_pib <-mean(oecd_data$pib_percapita)# Crear la columna 'pib_percapita_respecto' con respecto a la mediaoecd_data$pib_percapita_respecto <- oecd_data$pib_percapita - media_pib# Ordenar los países por PIB per cápitaoecd_data <- oecd_data[order(oecd_data$pib_percapita), ]#Graficoggplot(oecd_data, aes(x =reorder(paises, pib_percapita), y = pib_percapita -28500, fill = pib_percapita >28500)) +geom_bar(stat ="identity", width =0.7) +coord_flip() +scale_y_continuous(breaks =seq(-13500, 50000, by =13500), labels =function(x) x +28500) +scale_fill_manual(values =c("#FC4E2A", "#1A9850"), labels =c("Negativo", "Positivo")) +labs(title ="Diferencias en el PIB per capita de paises del G20",subtitle ="La media del PIB per capita es 28.500€",x ="Paises",y ="Diferencia ( €)",fill ="Crecimiento" ) +theme_minimal() +theme(legend.position ="bottom",plot.title =element_text(hjust =0.3, face ="bold", size =16), plot.subtitle =element_text(hjust =0.35), plot.caption =element_text(hjust =-0.3, vjust =2, face ="italic") ) +labs(caption ="Autor: Mario Salvador López Muñoz | Datos: https://datosmacro.expansion.com/paises/grupos/g20")
import pandas as pdimport matplotlib.pyplot as pltfrom matplotlib.ticker import AutoMinorLocatordf = pd.read_csv('~/Desktop/MasterBigData/0_Visualizacion/Actividad/Datos/Road Accident Data.csv')df['Accident_Severity'].replace('Fetal', 'Fatal', inplace=True)
<string>:1: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
df_filtrado = df[['Day_of_Week', 'Accident_Severity']]df_grouped = df_filtrado.groupby(['Day_of_Week', 'Accident_Severity']).size().unstack(fill_value=0)df_grouped_percentage = df_grouped.div(df_grouped.sum(axis=1), axis=0) *100df_grouped_percentage.index = ['Lunes', 'Martes', 'Miércoles', 'Jueves', 'Viernes', 'Sabado', 'Domingo']df_grouped_percentage.columns = ['Muy Grave', 'Severo', 'Leve']severity_order = ['Leve', 'Severo', 'Muy Grave']colors = ['#6B8E23', '#FF8C00', '#ff0000']df_grouped_percentage = df_grouped_percentage[severity_order]ax = df_grouped_percentage.plot(kind='bar', stacked=True, figsize=(10, 6), color=colors)#Yo diria que en este caso si que se puede hacer zoom, ya que estamos viendo las diferencias que hay entre las columnas. Por debajo de 75% es innecesario y no lleva a confusiónplt.ylim(60, 105)
(60.0, 105.0)
ax.yaxis.grid(True)ax.yaxis.set_minor_locator(AutoMinorLocator(n=3))ax.grid(which='minor', linestyle=':', linewidth='0.5', color='gray') plt.xlabel('Dia de la Semana')plt.ylabel('Accidentes en porcentaje')plt.title('Severidad de Accidentes de trafico por Dia de la semana')plt.legend(title='Severity', bbox_to_anchor=(1.05, 1), loc='upper left')plt.tight_layout() # Ajusta automáticamente para optimizar el espacioplt.show()
([<matplotlib.axis.XTick object at 0x12ffcf200>, <matplotlib.axis.XTick object at 0x12df3d130>, <matplotlib.axis.XTick object at 0x1389aa3c0>, <matplotlib.axis.XTick object at 0x12ffcdee0>, <matplotlib.axis.XTick object at 0x12ffba240>], [Text(0, 0, 'Education'), Text(1, 0, 'Entertainment'), Text(2, 0, 'Games'), Text(3, 0, 'Others'), Text(4, 0, 'Photo & Video')])
plt.legend()plt.title("Distribución de Aplicaciones Gratis y de Pago por Categoría")plt.ylabel("Numero de Aplicaciones")plt.figtext(0.5, -0.05, "Gráfico: Mario Salvador López Muñoz - Fuente: Editado a partir de original en Kaggle",wrap=True, horizontalalignment='center', fontsize=10, style='italic', color='gray')plt.show()
========================================
circlize version 0.4.16
CRAN page: https://cran.r-project.org/package=circlize
Github page: https://github.com/jokergoo/circlize
Documentation: https://jokergoo.github.io/circlize_book/book/
If you use it in published research, please cite:
Gu, Z. circlize implements and enhances circular visualization
in R. Bioinformatics 2014.
This message can be suppressed by:
suppressPackageStartupMessages(library(circlize))
========================================
df <-read.csv("https://raw.githubusercontent.com/apalbright/Friends/master/raw_data/friendsdata.csv")friends <-combn(c("Chandler", "Joey", "Monica", "Phoebe", "Rachel", "Ross"), 2)relations <-data.frame(from = friends[1, ], to = friends[2, ])total <-list()dyads <-c("12", "13", "14", "15", "16", "23", "24", "25", "26", "34", "35", "36", "45", "46", "56")for (x in dyads) { total[x] =nrow(df[df$dynamics == x, ])}relations["Total"] =as.numeric(unlist(total))colors <-c("Chandler"="tomato", "Joey"="goldenrod", "Monica"="steelblue", "Phoebe"="slateblue", "Rachel"="orchid", "Ross"="mediumseagreen")chordDiagram(relations, grid.col = colors)title("Relaciones entre los personajes de Friends")mtext("Autor: Mario Salvador López Muñoz | Datos: https://github.com/apalbright/Friends", side =1, line =4, adj =1, cex =0.7, col ="gray50")
Gráfico 14
Dia 14: heatmap
library(grid)library(pheatmap)# Cargar el dataset y crear la matrizx <-data.matrix(UScitiesD, rownames.force =TRUE)# Crear el heatmappheatmap( x,color =heat.colors(256),fontsize_row =10, # Tamaño de las etiquetas de las filasfontsize_col =10, # Tamaño de las etiquetas de las columnasangle_col =45, # Inclinación de las etiquetas de las columnascellwidth =20, cellheight =20, # Ajuste de tamaño de las celdasborder_color =NA, # Sin bordes para una apariencia más claradisplay_numbers =FALSE, # Opcional: muestra o no los valores numéricoscluster_cols =FALSE)# Agregar el título y el subtítulo usando grid.textgrid.text("Distancias entre Ciudades de EE. UU. expresadas en millas", y =unit(1, "npc") -unit(1, "lines"), gp =gpar(fontsize =14, fontface ="bold"))grid.text("Autor: Mario Salvador López Muñoz | Datos: UScitiesD (data de R)", y =0.05, gp =gpar(fontsize =10))
Gráfico 15
Dia 15: historical
wars_data <-data.frame(war =c("Guerra de los Treinta Años", "Guerra de Sucesión Española", "Guerras Napoleónicas", "Primera Guerra Mundial", "Segunda Guerra Mundial", "Guerra de los Siete Años", "Guerra Franco-Prusiana", "Guerra Civil Española", "Guerra de Bosnia", "Guerra de Kosovo", "Guerra de Crimea", "Guerra de los Diez Años", "Guerra de los Balcanes", "Guerra de Irak", "Guerra de Ucrania"),start_year =c(1618, 1701, 1803, 1914, 1939, 1756, 1870, 1936, 1992, 1998, 1853, 1947, 1912, 2003, 2022),end_year =c(1648, 1714, 1815, 1918, 1945, 1763, 1871, 1939, 1995, 1999, 1856, 1954, 1913, 2011, NA) # 'NA' para conflictos en curso)library(ggplot2)# Calcular la duración de cada guerrawars_data$duration <- wars_data$end_year - wars_data$start_year# Crear el gráficoggplot(wars_data, aes(x = start_year, y = duration, label = war)) +geom_segment(aes(xend = end_year, yend = duration), size =3, color ="blue") +geom_text(aes(y = duration, label = war), vjust =-0.5) +labs(title ="Duración de Guerras en Europa",x ="Año",y ="Duración (años)") +theme_minimal()
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_segment()`).
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_text()`).
Gráfico 16
Dia 16: weather
library(dplyr)library(lubridate)# Crear un dataset de ejemploset.seed(123)time_data <-data.frame(fecha =seq(ymd("2020-01-01"), ymd("2023-01-01"), by ="month"),temperatura =rnorm(37, mean =15, sd =5), # Temperatura aleatoriaprecipitacion =rnorm(37, mean =50, sd =20) # Precipitación aleatoria)library(ggplot2)# Gráfico de líneas para temperatura y precipitaciónggplot(time_data, aes(x = fecha)) +geom_line(aes(y = temperatura, color ="Temperatura (°C)"), size =1) +geom_line(aes(y = precipitacion /2, color ="Precipitación (mm)"), size =1) +# Ajustar escalascale_y_continuous(sec.axis =sec_axis(~ .*2, name ="Precipitación (mm)")) +# Eje secundariolabs(title ="Variación de Temperatura y Precipitación a lo Largo del Tiempo",x ="Fecha",y ="Temperatura (°C)",color ="Variables") +theme_minimal()
Gráfico 17
Dia 17: networks
library(igraph)
Attaching package: 'igraph'
The following object is masked from 'package:circlize':
degree
The following objects are masked from 'package:lubridate':
%--%, union
The following objects are masked from 'package:purrr':
compose, simplify
The following object is masked from 'package:tidyr':
crossing
The following object is masked from 'package:tibble':
as_data_frame
The following objects are masked from 'package:dplyr':
as_data_frame, groups, union
The following objects are masked from 'package:stats':
decompose, spectrum
The following object is masked from 'package:base':
union
library(networkD3)library(RColorBrewer)# Cargar el datasetdf <-read.csv('Datos/Family Guy Dataset.csv')# Definir el diccionario de normalizaciónnormalization_dict <-list("Peter"="Peter Griffin","Brian"="Brian","Chris"="Chris","Stewie"="Stewie","Lois"="Lois","Meg"="Meg","Cleveland"="Cleveland","Quagmire"="Quagmire","Joe"="Joe Swanson","Carter"="Carter Pewterschmidt","Mayor Adam West"="Mayor Adam West","Chicken"="The Giant Chicken","Herbert"="Herbert","Mort Goldman"="Mort Goldman")# Función para normalizar nombres de personajesnormalize_characters <-function(characters, norm_dict) { normalized <-sapply(characters, function(character) { character <-trimws(character)for (key innames(norm_dict)) {if (grepl(key, character, fixed =TRUE)) {return(norm_dict[[key]]) } }return(character) })return(normalized)}# Inicializar un data frame vacío para almacenar las conexionesconnections_df <-data.frame(from =character(), to =character(), peso =integer(), stringsAsFactors =FALSE)# Crear el DataFrame de conexionesfor (i in1:nrow(df)) {if (!is.na(df$Featuring[i])) { characters <-unlist(strsplit(as.character(df$Featuring[i]), ",")) normalized_characters <-normalize_characters(characters, normalization_dict)if (length(normalized_characters) >1) { # Solo procesar si hay más de un personaje combs <-combn(normalized_characters, 2)for (j in1:ncol(combs)) { connections_df <-rbind(connections_df, data.frame(from = combs[1, j], to = combs[2, j], peso =1)) } } }}# Contar las conexioneslinks <-aggregate(peso ~ from + to, data = connections_df, FUN = sum)# Crear la rednetwork <-graph_from_data_frame(d = links, directed =FALSE)# Contar el grado de cada nododeg <-degree(network, mode ="all")# Filtrar nodos para mantener solo los 20 más conectadostop_nodes <-names(sort(deg, decreasing =TRUE)[1:15])filtered_links <- links[links$from %in% top_nodes & links$to %in% top_nodes,]#Cambiamos el color de los links. Normalizamos valores, escojemos paleta aplicamos coloresfiltered_links <- filtered_links %>%mutate(normalized_value = (peso -min(peso)) / (max(peso) -min(peso)))palette <-colorRampPalette(brewer.pal(9, "Blues"))filtered_links <- filtered_links %>%mutate(linkColor =palette(100)[as.numeric(cut(normalized_value, breaks =100))])p <-simpleNetwork(filtered_links, height ="500px", width ="700px", Source =1, Target =2,linkDistance =300, charge =-200, fontSize =20, fontFamily ="Arial", linkColour = filtered_links$linkColor, nodeColour ="#FFA500", opacity =0.9,zoom =TRUE)p
Gráfico 18
Dia 18: Asian Development Bank (data day)
#No veas el curro que tiene esto... Y despues para que la unica diferencia es poder poner titulo... Aun se pueden mejorar algunas cosas, pero voy a seguir con otros graficos...library(tidyverse)library(treemap)library(ggfittext)library(scales)
Attaching package: 'scales'
The following object is masked from 'package:purrr':
discard
The following object is masked from 'package:readr':
col_factor
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr (8): ID, Season, Province, District, Date, Land, Description, Stage
dbl (1): Code
num (2): Latitude, Longitude
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data_ggplot <- data_tree[["tm"]] %>%as_tibble() %>%arrange(desc(vSize)) %>%mutate(rank =row_number(),xmax = x0 + w,ymax = y0 + h )data_ggplot_2 <- data_ggplot %>%group_by(Stage) %>%filter(vSize ==max(vSize)) %>%ungroup() data_ggplot_3 <- data_ggplot %>%filter(!is.na(Description)) %>%# Filtrar filas donde Description no es NAmutate(Description =as.character(Description))%>%ungroup() %>%mutate(size =3*w * (10/nchar(Description)))min_size <-1.5# Define el tamaño mínimo deseadop1 <-ggplot() +# Inicializa ggplot sin un dataframegeom_rect(data = data_ggplot_2, # Primer conjunto de datosaes(xmin = x0,ymin = y0,xmax = xmax,ymax = ymax,fill = color),size =2,colour ="#1E1D23",alpha =0.5) +# Transparencia para el primer conjuntogeom_text(data = data_ggplot_2, # Añadir texto del primer conjunto de datosaes(x = (x0 + xmax) /2,y = (y0 + ymax) /2,label = Stage), color ="black", size =3, fontface ="bold") +# Añadir el segundo geom_rect con un segundo conjunto de datosgeom_rect(data = data_ggplot_3, # Segundo conjunto de datosaes(xmin = x0,ymin = y0,xmax = xmax,ymax = ymax,fill = color),size =0.5,colour ="#1E1D23",alpha =0.3) +# Transparencia para el segundo conjuntogeom_text(data = data_ggplot_3, # Añadir texto del segundo conjunto de datosaes(x = (x0 + xmax) /2,y = (y0 + ymax) /2,size = size,label = Description), color ="black", fontface ="bold") +labs(title ="Cosechas de Pakistán",subtitle ="Tamaño del área equivale a la cantidad de plantaciones",caption ="Data: ADB - https://data.adb.org/dataset/ground-truthing-survey-data-crop-type-pakistan-rabi-2022-kharif-2023 | Design: Mario Salvador López") +theme_void() +theme(legend.position ="none") +theme(panel.background =element_rect(fill ="transparent", color =NA), # Fondo transparenteplot.background =element_rect(fill ="transparent", color =NA)) # Fondo transparente para todo el gráficop1
Semana 4: Timeseries
Gráfico 19
Dia 19: Dinosaurs
# Paquetes necesarios#Me he apoyado en este enlace para hacer el grafico: https://github.com/nrennie/30DayChartChallenge/blob/main/2024/ library(ggplot2)library(dplyr)library(showtext)
Loading required package: sysfonts
Loading required package: showtextdb
library(ggforce)library(tibble)# Datos de los eventos históricosevents <-data.frame(event =c("Mammals", "Dinosaurs", "Coal Swamps", "Land Plants","Trilobites", "Jellyfish", "Seaweeds", "Sexual\nReproduction", "Single-Celled Algae", "Iron-\nFormations", "Oldest Fossils", "Origin of Life", "Meteorite\nBombardment", "Formation of\nthe Earth"),time =c("23:39", "22:56", "22:24", "21:52", "21:04", "20:48", "20:28", "18:08", "14:08", "06:00", "05:36", "04:00", "03:00", "00:00:00"))# Función para convertir el tiempo en ángulostime_to_angle <-function(time_str) { parts <-as.numeric(unlist(strsplit(time_str, ":"))) hours <- parts[1] minutes <- parts[2] seconds <-ifelse(length(parts) ==3, parts[3], 0) total_seconds <- hours *3600+ minutes *60+ seconds angle <- (total_seconds / (24*3600)) *360return(angle)}# Convertir eventos a ángulos y agregar etiquetasevents <- events %>%mutate(angle =sapply(time, time_to_angle))events <- events %>%mutate(x =1.3*cos((90- angle) * pi /180), # Convertir ángulo a radianesy =1.3*sin((90- angle) * pi /180), # Convertir ángulo a radianesx_linea =1*cos((90- angle) * pi /180),y_linea =1*sin((90- angle) * pi /180) )# Datos del diseño de relojr <-1.1theta <-seq(0, (2* pi), length.out =13)[1:12]clock_data <-tibble(x = r *cos(theta),y = r *sin(theta),angle =90+360* (theta / (2* pi)),label =c("III", "II", "I", "XII", "XI", "X", "IX", "VIII", "VII", "VI", "V", "IV"))theta2 <-seq(0, (2* pi), length.out =61)[1:60]clock_data2 <-tibble(x = r *cos(theta2),y = r *sin(theta2))# Configuración de fuentes y coloresfont_add_google("Roboto", "roboto")font_add_google("Roboto Slab", "roboto_slab")showtext_auto()bg_col <-"#fafafa"text_col <-"grey10"body_font <-"roboto"title_font <-"roboto_slab"# Crear el gráficoggplot() +geom_point(data = clock_data2,mapping =aes(x = x, y = y),size =0.5,colour = text_col ) +geom_label(data = clock_data,mapping =aes(x = x, y = y, label = label),family = title_font,size =8,label.size =0,fill = bg_col,colour = text_col,fontface ="bold" ) +geom_label(data = events,mapping =aes(x = x, y = y, label = event),family = title_font,size =3,label.size =0,fill = bg_col,colour = text_col,fontface ="bold") +annotate("text", x =-0.00839848, y =1.499976,label ="Humanity", size =3,family = body_font, colour = text_col,fontface ="bold") +geom_segment(data = events,aes(x =0, y =0, xend = x_linea, yend = y_linea), color ="grey50", # Color de las líneaslinewidth =0.5# Grosor de las líneas ) +geom_segment(aes(x =0, y =0, xend =-7.839396e-03, yend =1.3999781),colour ="grey50", # Cambia el color de la línea según lo deseeslinewidth =0.5# Ajusta el grosor de la línea si es necesario ) +scale_fill_manual(values =c("gray80", "gray50", "gray20") ) +labs(title ="Historia de la Tierra",subtitle ="Si la historia de la tierra estuviese comprimda en 12 horas, la humanidad solo\n habria estado presente durante 39 segundos, los dinosaurios 30 minutos.",caption ="**Graphic**: Mario Salvador López Muñoz. Editado de: https://flowingdata.com/2012/10/09/history-of-earth-in-24-hour-clock/" ) +coord_fixed() +theme_void(base_size =24, base_family = body_font) +theme(legend.position ="none",plot.background =element_rect(fill = bg_col, colour = bg_col),panel.background =element_rect(fill = bg_col, colour = bg_col),plot.title =element_text(family = title_font, colour = text_col, size =12, face ="bold", hjust =0.5),plot.subtitle =element_text(family = body_font, colour = text_col, size =10, hjust =0.5),plot.caption =element_text(family = body_font, colour = text_col, size =8, hjust =0.5) )
Gráfico 20
Dia 20: correlation
import pandas as pdimport matplotlib.pyplot as pltimport seaborn as snsfrom scipy.spatial.distance import pdist, squareform# Cargar el conjunto de datosdataset = pd.read_csv("Datos/global-data-on-sustainable-energy (1).csv")# Seleccionar columnas específicasfiltered_dataset = dataset[["Entity", "Year", "Renewable-electricity-generating-capacity-per-capita"]]# Filtrar los datos para eliminar filas con valores NaNfiltered_dataset = filtered_dataset.dropna(subset=["Renewable-electricity-generating-capacity-per-capita"])train_transpuesto = filtered_dataset.transpose()train_transpuesto = filtered_dataset.pivot(index='Entity', columns='Year', values='Renewable-electricity-generating-capacity-per-capita')distance_matrix = pd.DataFrame(squareform(pdist(train_transpuesto, metric='euclidean')), index=train_transpuesto.index, columns=train_transpuesto.index)plt.figure(figsize=(10, 8))sns.heatmap(distance_matrix, cmap='coolwarm', annot=False)plt.title('Matriz de Distancia (Heatmap)')plt.show()
Rows: 3649 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Entity
dbl (19): Year, Access to electricity (% of population), Access to clean fue...
num (1): Density\n(P/Km2)
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
filtered_green_data <- green_data %>%filter(!is.na(green_data$`Renewable-electricity-generating-capacity-per-capita`)) %>%select(Entity, Year, 'Renewable-electricity-generating-capacity-per-capita')total_capacity <- filtered_green_data %>%group_by(Entity) %>%summarise(total_renewable_capacity =sum(`Renewable-electricity-generating-capacity-per-capita`, na.rm =TRUE)) %>%arrange(desc(total_renewable_capacity)) # Ordenar de mayor a menortop_7_entities <- total_capacity %>%slice_head(n =7)final_green_data <- filtered_green_data %>%filter(Entity %in% top_7_entities$Entity)# Crear el gráfico de líneasline_plot <-ggplot(final_green_data, aes(x = Year, y =`Renewable-electricity-generating-capacity-per-capita`, color = Entity, group = Entity)) +geom_line(size =1) +# Dibujar líneasgeom_text_repel(data = final_green_data %>%group_by(Entity) %>%summarise(last_year =max(Year), last_value =last(`Renewable-electricity-generating-capacity-per-capita`)),aes(x = last_year, y = last_value, label = Entity), vjust =0.5,hjust =0,nudge_x =3,size =3) +labs(title ="Paises con mayor Generación de Electricidad Renovable per cápita del mundo",x ="Año",y ="Capacidad Renovable de Electricidad por Persona (kW/hab)",caption ="**Graphic**: Mario Salvador López Muñoz. Datos: https://www.kaggle.com/datasets/anshtanwar/global-data-on-sustainable-energy") +theme_minimal() +theme(legend.position ="none")print(line_plot)
Rows: 189 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr (1): Entity
dbl (2): Year, International tourist arrivals by region
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
tourist_data_wide <- tourist_data %>%pivot_wider(names_from = Year, # Cada año se convierte en una columnavalues_from =`International tourist arrivals by region`# Los valores de cada año )write.csv(tourist_data_wide, "Datos/tourist_data_ready.csv", row.names =FALSE)
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): ref_area.label, source.label, indicator.label, sex.label, classif1....
dbl (2): time, obs_value
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
filtered_pobreza <- pobreza %>%filter(sex.label =="Sex: Total") %>%filter(classif1.label =="Age (Youth, adults): 15+") %>%select(ref_area.label, sex.label, time, obs_value)filtered_pobreza_2 <- filtered_pobreza %>%filter(!grepl("Upper-middle|Lower-middle|Low income|High income|World|Asia|America|Southern|Northern|Eastern|Western|APEC|G20|BRICS|ASEAN|MENA|CARICOM", ref_area.label))pobreza_2001 <- filtered_pobreza_2 %>%filter(time ==2001) %>%select(ref_area.label, obs_value) %>%arrange(obs_value) # Ordena de mayor a menorpaises_ordenados <- pobreza_2001$ref_area.labelordered_pobreza <- filtered_pobreza_2 %>%mutate(ref_area.label =factor(ref_area.label, levels = paises_ordenados)) %>%arrange(ref_area.label) # Ahora ordena el dataframe original según el nuevo factorlista_paises <-unique(filtered_pobreza_2$ref_area.label)
library(viridis)library(ggExtra)p <-ggplot(ordered_pobreza, aes(time,ref_area.label,fill=obs_value)) +geom_tile(color="white",size=0.1) +scale_fill_viridis(name="Porcentaje",option ="C")#p <-p + facet_grid(year~month)p <-p +scale_y_discrete(breaks =unique(filtered_pobreza$ref_area.label))p <-p +scale_x_continuous()p <-p +theme_minimal(base_size =8)p <-p +labs(title=paste("Porcentaje de la poblacion trabajadora que vive\n con menos de 2.15$ al dia"), x="Año", y="")p <-p +theme(legend.position ="bottom")+theme(plot.title=element_text(size =14))+theme(axis.text.y=element_text(size=6)) +theme(strip.background =element_rect(colour="white"))+theme(plot.title=element_text(hjust=0))+theme(axis.ticks=element_blank())+theme(axis.text=element_text(size=7))+theme(legend.title=element_text(size=8))+theme(legend.text=element_text(size=6))+labs(caption ="Graphic: Mario Salvador López Muñoz. Datos: https://ilostat.ilo.org/topics/working-poverty/") +removeGrid()#ggExtrap
Semana 5: Uncertainties
Gráfico 25
Dia 25: global change
#Importación de los datosdata <-read_csv("Datos/export_Datos_Clima_Madrid.csv")
Rows: 3318 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (8): tavg, tmin, tmax, prcp, snow, wdir, wspd, pres
lgl (2): wpgt, tsun
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
3.5.0.
ℹ Please use the `legend.position.inside` argument of `theme()` instead.
radar <- radar +labs(title ="Mejores amigos en blanco y negro") +theme(plot.background =element_rect(fill ="#fbf9f4", color ="#fbf9f4"),panel.background =element_rect(fill ="#fbf9f4", color ="#fbf9f4"),plot.title.position ="plot", # slightly different from defaultplot.title =element_text(family ="sans", size =20,face ="bold", color ="#000000" ) )print(radar)
Rows: 29491 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): candidate, state, party
dbl (5): pct_trend_adjusted, cycle, pct_estimate, hi, lo
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
president <- president %>%filter(cycle ==2020) %>%select(candidate, date, state, pct_trend_adjusted)president <- president %>%group_by(state) %>%slice_max(order_by = pct_trend_adjusted, n =1, with_ties =FALSE) %>%ungroup() %>%mutate(pct_trend_adjusted =round(pct_trend_adjusted)) %>%ungroup()write.csv(president, "president_data.csv", row.names =FALSE)#Datos de poblacion obtenidos de wikipedia y editados con excel